


MASSACHUSETTS INSTITUTE OF TECHNOLOGY 
ARTIFICIAL INTELLIGENCE LABORATORY 



A.I. Memo No. 340 December 1975

EARLY PROCESSING OF VISUAL INFORMATION 


by 



D. Marr






ABSTRACT. The article describes a symbolic approach to visual information processing, and
sets out four principles that appear to govern the design of complex symbolic information
processing systems. A computational theory of early visual information processing is
presented, which extends to about the level of figure-ground separation. It includes a
process-oriented theory of texture vision. Most of the theory has been implemented, and
examples are shown of the analysis of several natural images. This replaces Memos 324 and
334.



This report describes research done at the Artificial Intelligence Laboratory of the
Massachusetts Institute of Technology. Support for the laboratory's artificial intelligence
research is provided in part by the Advanced Research Projects Agency of the Department of
Defense under Office of Naval Research contract N00014-75-C-0643.

1. An introduction is given to a theory of early visual information processing. The theory has
been implemented, and examples are given of images at various stages of analysis.

2. It is argued that the first step of consequence is to compute a primitive but rich description
of the gray-level changes present in an image. The description is expressed in a vocabulary
of kinds of intensity change (EDGE, SHADING-EDGE, EXTENDED-EDGE, LINE, BLOB etc.).
Modifying parameters are bound to the elements in the description, specifying their POSITION,
ORIENTATION, TERMINATION points, CONTRAST, SIZE and FUZZINESS. This description is
obtained from the intensity array by fixed techniques, and it is called the primal sketch.

3. For most images, the primal sketch is large and unwieldy. The second important step in
visual information processing is to group its contents in a way that is appropriate for later
recognition.

4. From our ability to interpret drawings with little semantic content, one may infer the
presence in our perceptual equipment of symbolic processes that can define "place-tokens" in
an image in various ways, and can group them according to certain rules. Homomorphic
techniques fail to account for many of these grouping phenomena, whose explanations require
mechanisms of construction rather than mechanisms of detection.

5. The necessary grouping of elements in the primal sketch may be achieved by a mechanism
that has available the processes inferred from (4), together with the ability to select items by
first-order discriminations acting on the elements' parameters. Only occasionally do these
mechanisms use downward-flowing information about the contents of the particular image
being processed.

6. It is argued that "non-attentive" vision is in practice implemented by these grouping
operations, and first-order discriminations acting on the primal sketch. The class of
computations so obtained differs slightly from the class of second-order operations on the
intensity array.



7. The extraction of a form from the primal sketch using these techniques amounts to the
separation of figure from ground. It is concluded that most of the separation can be carried
out using techniques that do not depend upon the particular image in question. Therefore,
figure-ground separation can normally precede the description of the shape of the extracted
form.

8. Up to this point, higher-level knowledge and purpose are brought to bear on only a few of
the decisions taken during the processing. This relegates the widespread use of
downward-flowing information to a later stage than is found in current machine-vision
programs, and implies that such knowledge should influence the control of, rather than
interfering with, the actual data-processing that is taking place lower down.

The vision problem begins with a large gray-level intensity array, and
culminates in a description that depends on that array, and on the purpose for which it is
being viewed. The question of interest is what has to go on in between. This article outlines
the first part of a theory of visual information processing, and covers the analysis up to about
the level of figure-ground separation. The theory is restricted to single frame,
monochromatic, monocular images without specularities, reflections, translucency, transparency
or light sources. It is argued that the first step of consequence is to compute a primitive but
rich description of the gray-level changes present in an image, and that all subsequent
computations are implemented as manipulations of that description. The description itself is
called the Primal Sketch. The processes that compute it, and most of the processes that
operate directly on it, do not depend significantly upon the particular contents of the image.
The control of these processes may,

The approach taken here rests upon the observation that a drawing of a scene
adequately represents the scene, despite the very different gray-level image to which it gives
rise. It therefore seems reasonable to suppose that the artist's local symbols are in
correspondence with natural symbols that are computed out of the image during the normal
course of its interpretation. The idea that visual processing should commence with the
extraction of a more or less elaborate line-drawing is not a new one, but its successful
implementation has proved elusive. Several edge-detection algorithms have been proposed
(Hueckel (1971 & 1973), Macleod (1970), Rosenfeld & Thurston (1971), Rosenfeld, Thurston &
Lee (1972), Horn (1973)), but as their proliferation suggests, the results of applying them to
natural images have proved generally unsatisfactory. This has led some to believe that an
adequate line-drawing of a scene cannot be computed unless hypotheses about what is
present are allowed to influence quite early stages in the processing (Shirai 1973, Freuder
1975).

How much independent pre-processing can usefully be carried out? Do the
different stages in recognition have to interact in a rich and complex way, or may they be
implemented in modules that are to a first approximation independent? These questions do
not depend upon the particular hardware (wet or dry) in which the processing is implemented.
We need to answer them before we can address "higher-level" problems, because the nature
of the answers determines the overall strategy that subsequent processes must employ.

General principles

Several lessons have been learnt over the last ten years from the experience
of designing and implementing large symbolic computer programs. These lessons may be
expressed as four principles for the organization of complex symbolic processes. Because I
shall need to refer to them, and because recognition and other advanced biological
computations are complex symbolic processes, I take the liberty of setting out these
principles here.
1: Principle of explicit naming

Whenever a collection of data is to be described, discussed or manipulated as a
whole, it should first be given a name. This forms the data into an entity in its own right,
permits properties to be assigned to it, and allows other structures and processes to refer to
it. The act of naming is the distinguishing mark of symbolic computation, and this insight was
the single most important idea behind the invention of the programming language called LISP
(McCarthy et al. 1962).
2: Principle of modular design

Any large computation should be split up and implemented as a collection of
small sub-parts that are as nearly independent of one another as the overall task allows. If a
process is not designed in this way, a small change in one place will have consequences in
many other places. This means that the process as a whole becomes extremely difficult to
debug or to improve, whether by a human designer or in the course of natural evolution,
because a small change to improve one part has to be accompanied by many simultaneous
compensating changes elsewhere.

3: Principle of least commitment

The principle of least commitment states that one should never do something
that may later have to be undone, and I believe that it applies to all situations in which
performance is fluent. It is frequently the case during the execution of a recognition task that
there are a number of possible interpretations of a particular datum, but that there is not yet
sufficient evidence to decide between them. In such cases, one should never become
committed to one of the possibilities prematurely, because of the damage that knowledge
associated with that possibility and not with the others can subsequently do.

There are two escapes from situations in which the principle is about to be
violated. One is to "wait and see", hopeful that the rival possibilities can be maintained
without causing memory overflow until information becomes available that can select the
correct interpretation. Marcus (1974) has conjectured that the structure of English syntax is
such that a wait-and-see parser never has to wait very long before seeing. The other escape
is to restructure the problem, by breaking the computation into more steps, by increasing the
vocabulary for expressing the possible choices, and by adding more diagnostics for deciding
between rival possibilities. The sheer volume of information rules out a wait-and-see
approach to early visual processing, so only the second alternative is a real option there. My
experience has been that if one has to disobey the principle of least commitment, one is
either doing something wrong, or something very difficult.

An application of the principle is frequently accompanied by a particular style
of computation called constraint analysis or filtering. We shall meet it later in this article.
Where several possibilities compete for the privilege of describing a particular datum, there
usually exist constraints or measures of preference that operate among them. The act of
filtering the possibilities using the constraints is a distinctive style of computation, somewhat
reminiscent of relaxation techniques for solving complex problems in structural engineering.
Constraint analysis was first used effectively in a vision program by Waltz (1972). A neural
implementation of essentially this technique was given by Marr (1971 section 3.1.?).
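
The filtering style of computation can be sketched as follows. This is a minimal
illustration only, not the procedure of Waltz (1972) or of the present system: the data
layout (a table of candidate labels per datum), the function names and the toy
compatibility rule are all assumptions made for the example. Each candidate is discarded
as soon as some neighbouring datum offers no label compatible with it, and the process
repeats until nothing changes.

```python
def filter_candidates(candidates, neighbours, compatible):
    """Repeatedly discard labels that some neighbouring datum cannot support.

    candidates: dict mapping each datum to the set of labels still possible
    neighbours: dict mapping each datum to the data that constrain it
    compatible: function (datum_a, label_a, datum_b, label_b) -> bool
    All three are hypothetical encodings chosen for this sketch.
    """
    changed = True
    while changed:
        changed = False
        for a in candidates:
            for label_a in list(candidates[a]):
                for b in neighbours.get(a, []):
                    supported = any(compatible(a, label_a, b, label_b)
                                    for label_b in candidates[b])
                    if not supported:
                        candidates[a].discard(label_a)
                        changed = True
                        break
    return candidates


# Toy example: two adjacent data whose labels must both be edge-like.
EDGE_LIKE = {"EDGE", "EXTENDED-EDGE"}


def compatible(a, label_a, b, label_b):
    return label_a in EDGE_LIKE and label_b in EDGE_LIKE


candidates = {"p": {"EDGE", "LINE"}, "q": {"EDGE", "EXTENDED-EDGE"}}
neighbours = {"p": ["q"], "q": ["p"]}
result = filter_candidates(candidates, neighbours, compatible)
```

In the toy example the label LINE is removed from p, while q keeps both of its labels:
the constraints prune what they can, and no choice is forced where the evidence is still
insufficient, which is the behaviour the principle of least commitment asks for.
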
4: Principle of graceful degradation

The final principle is designed to ensure that wherever possible, degrading the
data will not prevent one from delivering at least some of the answer. It amounts to a
condition on the continuity of the relation between descriptions computed at different stages
in the processing. For example, it would be foolish not to require that a "rough",
two-dimensional description, of the kind that a vision system might compute out of a drawing,
should enable it to compute a "rough" three-dimensional description of what the drawing
represents.

Early Processing: computing the primal sketch
The primal sketch consists of a primitive but rich description of the intensity
changes that are present in an image. This description consists of a set of assertions,
expressed in terms of a vocabulary of symbols and modifiers that are powerful enough to
capture all of the important information in an intensity array. An example of such an
assertion might be

(SHADING-EDGE (POSITION (33 48) (73 48))
              (CONTRAST 31)
              (FUZZINESS 17)
              (ORIENTATION 0))
The design of a method for achieving this rests on two primary decisions: what
types of intensity change are to be detected, and how expressive is the vocabulary in terms
of which these changes are to be described?
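
One way to hold such an assertion in a program is as a small named record. The sketch
below is illustrative only: the class name, the field names and the helper to_sexpr are
assumptions, and the endpoint coordinates are simply example values. It shows that the
symbol and its modifiers travel together as one entity that other processes can refer to.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Assertion:
    """One element of a primal-sketch-like description (hypothetical layout)."""
    kind: str                 # e.g. "EDGE", "SHADING-EDGE", "LINE"
    start: Tuple[int, int]    # first endpoint of the element
    end: Tuple[int, int]      # second endpoint of the element
    contrast: float
    fuzziness: float
    orientation: float


def to_sexpr(a):
    """Render an assertion in the parenthesised notation used in the text."""
    (x1, y1), (x2, y2) = a.start, a.end
    return (f"({a.kind} (POSITION ({x1} {y1}) ({x2} {y2})) "
            f"(CONTRAST {a.contrast}) (FUZZINESS {a.fuzziness}) "
            f"(ORIENTATION {a.orientation}))")


example = Assertion("SHADING-EDGE", (33, 48), (73, 48), 31, 17, 0)
text = to_sexpr(example)
```
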



One-dimensional intensity profiles

In an empirical study, Herskovitz & Binford (1970 pp 19, 53, 55) found that the
most common intensity changes in images of scenes composed of polyhedral objects were step
changes, bumps, and roof-shaped profiles. Our experience adds some others for more general
scenes (see figure 2). The detection of roof-shaped intensity changes requires a sensitivity
to changes in intensity gradient. The human visual system has long been known to be
sensitive to such changes (Mach Bands, Ratliff 1965), but of the edge-detection algorithms
referred to in the introduction, only the Binford-Horn line-finder (Horn 1973) incorporates a
sensitivity to the second derivative of intensity. There is no evidence that humans are
sensitive to higher derivatives.

A competent edge-finder therefore needs to be sensitive to discontinuities in
intensity and in intensity gradient, or (roughly) to measure the first and second derivatives of
intensity everywhere. Approximations to these quantities may conveniently be obtained by
convolving the image locally with "edge-shaped" and "bar-shaped" masks (see figure 2a). This
follows from the fact that an edge-shaped mask measures an approximation to the local
intensity gradient in a particular direction. A bar-mask may be thought of as composed of
two adjacent edge-masks with opposite signs. It therefore measures approximately the local
change in intensity gradient.
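
A one-dimensional sketch may make the argument concrete. The functions below are
assumptions introduced for illustration (uniform panels of weight plus or minus one, and
a sliding sum of products in place of a true convolution); they are not the masks of
figure 2a. The edge-mask responds to a constant gradient with a constant value, while the
bar-mask, built from two adjacent edge-masks of opposite sign, gives zero wherever the
gradient does not change.

```python
def edge_mask(width):
    """Two adjacent panels: -1 over the left panel, +1 over the right."""
    return [-1] * width + [1] * width


def bar_mask(width):
    """An edge-mask followed by the same mask with opposite sign."""
    left = edge_mask(width)
    return left + [-v for v in left]


def correlate(signal, mask):
    """Slide the mask along the signal and sum the products at each position."""
    n = len(mask)
    return [sum(m * signal[i + j] for j, m in enumerate(mask))
            for i in range(len(signal) - n + 1)]


step = [0] * 10 + [10] * 10     # a step change of height 10
ramp = list(range(20))          # a constant intensity gradient

step_edge = correlate(step, edge_mask(2))   # peaks where the step occurs
ramp_edge = correlate(ramp, edge_mask(2))   # constant: measures the gradient
ramp_bar = correlate(ramp, bar_mask(2))     # zero: the gradient never changes
```
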

FIGURE 1. Selecting the appropriate mask-size from which to compute the description of an
intensity change. The figure illustrates the convolution of "edge-shaped" masks of three sizes
with different intensity distributions. The masks are shown to scale on the left, and the
widths of their panels are 10, 2$, and 60 units. The three intensity distributions are a step
function (a), a function that increases linearly over 100 units and is constant elsewhere (b),
and a step-change 10 units wide superimposed on the linear one (c). Convolutions with each
of these distributions are exhibited opposite each mask. In (d), the peak height that occurs in
each convolution has been plotted against mask size for each intensity distribution; trace 1
corresponds to distribution (a), trace 2 to distribution (b), and trace 3 to distribution (c). The
selection criterion chooses a mask size if it corresponds to a peak or to the left-hand end of a
near plateau in the graph (d). Some distributions cause two mask sizes to be selected.
Distribution (c) is one of these. The mask sizes selected for it are 10, and a value near 90.

FIGURE 2. Classifying the gray-level intensity changes present in an image. Examples of the
edge- and bar-masks that were used appear in (a). The text classifies the possible
configurations of peak patterns in edge- and bar-mask convolution profiles, and this
classification is illustrated by (b) - (f). Edge-mask profiles are labelled with a 1, and bar-mask
profiles (second derivative) with a 2. The classes are EDGE (b), EXTENDED-EDGE (c), EWR
(Mach Band) (d), LINE (e) and SHADING-EDGE (f). Intermediate terms are used when the
process fails to find sufficient peaks to determine the edge type.

FIGURE 3. The intensity distribution exhibited in (a), whose profile appears in (b), was obtained
by illuminating a curved piece of white paper from one end, and viewing it from above. Its
description, computed using an edge-mask of panel-width 8 (c), and bar-masks of panel-widths
4 (d) and 5 (e), is as follows:

EDGE (POSITION 80) (CONTRAST 136) (FUZZ SHARP)
EDGE (POSITION 212) (CONTRAST 3) (FUZZ 4)
EDGE (POSITION 292) (CONTRAST 2) (FUZZ SHARP)
EDGE (POSITION 435) (CONTRAST -3) (FUZZ 4)
EDGE (POSITION 444) (CONTRAST 25) (FUZZ 5)
EDGE (POSITION 461) (CONTRAST 2) (FUZZ 4)
EDGE (POSITION 490) (CONTRAST 1) (FUZZ 4)
EXTENDED-EDGE (POSITION 552) (CONTRAST -12) (FUZZ 9)
(the peaks giving rise to this edge are marked with arrows)
EDGE (POSITION 624) (CONTRAST -20) (FUZZ 6)
EDGE (POSITION 6?&) (CONTRAST 3) (FUZZ 4)
EDGE (POSITION 684) (CONTRAST -4) (FUZZ 4)
SHADING-EDGE (POSITION 570) (CONTRAST -11) (WIDTH &7)
SHADING-EDGE (POSITION 391) (CONTRAST 4) (WIDTH 36)
SHADING-EDGE (POSITION 339) (CONTRAST -8) (WIDTH 73)

This argument defines the types of intensity change that are to be detected,
but it is important to note that simply making the measurements is not enough. Almost every
point in almost every natural image gives rise to a non-zero convolution value with almost
every size and orientation of edge-mask. We therefore have to compute from this mass of
data some symbol that represents a local piece of edge, and it is this symbol that will then
stand in correspondence with a line segment in an artist's drawing. Fortunately we can make
a great simplification at this stage in the analysis. Provided that measurements are made with
masks of two or more sizes, the positions and sizes of the peaks in the measurements provide
enough information to compute the description of the underlying intensity changes.
Furthermore, provided that a group of peaks is sufficiently isolated from other peaks, the
other peaks may be ignored when analyzing that group.

The reason for this is illustrated in figure 1, which shows the difference
between edge-mask values obtained using masks of three different sizes on a step change in
intensity (1a), and on a gradual change (1b). The results are analogous to the power spectra
of the different intensity distributions. Step changes are "seen" equally well by all sizes of
mask. Gradual changes are seen increasingly faintly by edge-shaped masks whose dimensions
are smaller than the distance over which the intensity change is taking place. Figure 1d shows
this effect in graphic form, by plotting the maximum (absolute) edge-mask value against the
mask width. Trace 1 arises from the step change (figure 1a), and trace 2 arises from the
gradual intensity change (figure 1b). A good estimate of the spatial extent ("fuzziness") of an
edge may be made by finding the mask size at which the edge-mask response starts to
diminish. Accordingly the following criterion is used.

Selection criterion: mask size s is selected at point P in the image whenever (a) masks
slightly smaller than s give an appreciably smaller peak at P, and (b) slightly larger masks give
a peak that is not appreciably larger.
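
The criterion can be sketched in a few lines. Everything specific here is an assumption
made for illustration: "appreciably" is read as a relative tolerance tol, the smallest mask
is compared against a notional response of zero, and the largest mask is never selected
because condition (b) cannot be checked for it. The peak heights are invented numbers
shaped like the three traces of figure 1d, not measurements.

```python
def select_mask_sizes(sizes, peaks, tol=0.1):
    """Return the mask sizes that satisfy conditions (a) and (b) at one point.

    sizes: mask sizes in increasing order
    peaks: peak height measured at the point with each mask size
    tol:   hypothetical reading of "appreciably" as a relative difference
    """
    selected = []
    for i, size in enumerate(sizes):
        if i + 1 >= len(sizes):
            continue  # no larger mask available, so (b) cannot be checked
        here = abs(peaks[i])
        below = abs(peaks[i - 1]) if i > 0 else 0.0
        above = abs(peaks[i + 1])
        smaller_masks_weaker = below < (1 - tol) * here          # condition (a)
        larger_masks_not_stronger = above <= (1 + tol) * here    # condition (b)
        if here > 0 and smaller_masks_weaker and larger_masks_not_stronger:
            selected.append(size)
    return selected


# A step change: every mask size sees it equally well.
step_sel = select_mask_sizes([5, 10, 20, 40], [10, 10, 10, 10])

# A gradual change: the response grows until the mask spans the change.
ramp_sel = select_mask_sizes([5, 10, 20, 40, 80, 160], [1, 2, 4, 8, 8.2, 8.3])

# A sharp step superimposed on a gradual change: two sizes are selected.
both_sel = select_mask_sizes([10, 20, 40, 80, 100, 160], [5, 5, 5.2, 8, 10, 10.3])
```

With these invented values the step selects the smallest mask (the left-hand end of a
plateau), the gradual change selects the size at which the response levels off, and the
superimposed distribution selects one small and one large size.
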

For some intensity distributions, more than one mask size will satisfy the
selection criterion. For the distribution shown in figure 1c, the criterion is satisfied by s = 10
and s = 80 to 100 (depending on the algorithm that interprets "appreciably"), as can be seen
from trace 3 of figure 1d. Such a distribution would give rise to three assertions, a sharp
negative edge close to a sharp positive one, and a fuzzy positive edge that encompasses the
other two.

This shows one way in which the use of multiple mask sizes is important, but
there is another reason which is nearly as important. It is that where a faint edge exists in
the image, it is frequently impossible to tell from a single record which of the peaks are
important, and which are due to noise. Matching peaks obtained using different sizes of mask
greatly aids the separation of signal from noise.
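
The matching step can be illustrated very simply. The function name, the position
tolerance and the peak positions below are assumptions made for the example; the sketch
keeps a peak found with a small mask only if a larger mask also reports a peak nearby.

```python
def match_peaks(small_mask_peaks, large_mask_peaks, tolerance=2):
    """Keep small-mask peak positions that a larger mask confirms nearby."""
    return [p for p in small_mask_peaks
            if any(abs(p - q) <= tolerance for q in large_mask_peaks)]


# Invented positions: the peaks at 23 and 57 have no counterpart in the
# larger-mask record and are treated as noise in this sketch.
confirmed = match_peaks([10, 23, 40, 57], [11, 41])
```
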

The algorithm to which this leads is similar to the non-linear technique
described by Rosenfeld & Thurston (1971). The difference lies in the use to which the
algorithm is put. Rosenfeld & Thurston used it for detecting texture boundaries at which the
average gray-level change was small compared with the contrast occurring within each
texture. To achieve successful results, they required that measurements from masks of all
sizes be available at all points in the image. (Note that unlike spatial frequency, the denser
the measurements, the more information one has. If measurements are made at every point
and sufficient information is available about the boundaries, a finite intensity array is
completely recoverable from its convolution with any edge- or bar-shaped mask that is not
too large). In the present theory, texture boundaries are detected by other means, and the
algorithm is used simply to obtain a measure of the spatial extent of an intensity change.
Hence, unlike Rosenfeld & Thurston's application, the distance between measurements can
decrease as the size of the mask increases without weakening the technique.

The process of computing the description consists then of four operations: (1)
find and match peaks in the measurements obtained from the convolutions of the image with
different sizes of mask; (2) select the relevant peaks using the selection criterion; (3)
separate the peaks into isolated groups; and (4) parse the local configuration of peaks into a
descriptive element. A small number of classes of peak configuration suffices to cover the
cases that can actually occur, and they are illustrated in figure 2. The figure shows typical
combinations of peak patterns that occur in the outputs from edge-mask (upper records) and
from bar-mask (lower records) convolutions. Examples of the masks that we use appear in
figure 2a. The descriptor EDGE is used when two peaks of about equal and opposite signs
occur together in the bar-mask record (2b). If one bar-mask peak is considerably smaller
than the other, the edge is classified as an EXTENDED-EDGE (2c). Extended-edges are common
where a convex boundary is illuminated from one side. Figure 2d shows an intensity gradient
edge, and figure 2e corresponds to the presence of a thin LINE such as can occur in the
highlight from an object's edge, or a very thin pencil stroke. Finally there are edges that
begin and end gradually, and extend over a relatively large distance; these are classified as
SHADING-EDGEs (figure 2f). In addition to descriptors of edge type, one can measure an
edge's CONTRAST, POSITION, ORIENTATION, and FUZZINESS. This last parameter characterizes
the spatial extent of the edge.
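
The distinction between EDGE and EXTENDED-EDGE in step (4) can be sketched as a test on
the two bar-mask peaks. The function name and the ratio threshold are assumptions for
illustration, since the text does not quantify "about equal" or "considerably smaller";
the other classes of figure 2 are deliberately left out of the sketch.

```python
def classify_bar_peaks(peak_a, peak_b, ratio_threshold=0.5):
    """Classify a pair of adjacent bar-mask peaks of opposite sign.

    Returns "EDGE" when the two heights are comparable, "EXTENDED-EDGE" when
    one is considerably smaller, and None when the peaks do not have opposite
    signs (a configuration this sketch does not attempt to handle).
    """
    if peak_a * peak_b >= 0:
        return None
    small, large = sorted((abs(peak_a), abs(peak_b)))
    return "EDGE" if small / large >= ratio_threshold else "EXTENDED-EDGE"


kind_equal = classify_bar_peaks(10, -9)     # comparable heights
kind_unequal = classify_bar_peaks(10, -2)   # one peak much smaller
kind_other = classify_bar_peaks(10, 3)      # same sign: not handled here
```
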

Figure 3 gives an example of an intensity distribution that has been described
by this process, and the legend explains which mask convolutions were used. One of the
assertions has been related back to the convolution profiles, and the arrows point to the peaks
that gave rise to that particular assertion. The low-level vocabulary that is used in our
present system is not intended to be definitive, but some claim is made to the effect that it is
a good example of the genre, because it rests on the correct measurements, it has sufficient
expressive power to describe most kinds of shading adequately, and the method is simple and
works reasonably well.


Extension to two dimensions

FIGURE 4. This figure provides a high quality reproduction of the six images discussed in the
text. a and b were taken with a considerably modified Information International Incorporated
Vidissector, and the rest were taken with a Telemation TMC-2100 vidicon camera attached to
a Spatial Data Systems digitizer (Camera Eye 108). The full dynamic range from black to
white is represented by 256 gray levels. The images reproduced here were created by an
Optronics P1500 Photowriter from intensity arrays that measured 128 elements square. This
size of intensity array corresponds to viewing a 1 inch square at 5 feet with the human
retina. The image of the period at the end of this sentence probably covers more than 40
retinal receptors. The reader should view the images from a distance of about five feet when
assessing the performance of the programs. In the interests of clarity, these intensity arrays
have been displayed in two other ways (where helpful). They have been printed on a
Xerographic printer using a font of 16 gray levels, and they have been displayed as a
three-dimensional graph, in which the z coordinate represents intensity. These displays appear
in the figures.

FIGURE 5. The image of the chair, whose half-tone representation is given in figure 4a, has
been printed in a 16 gray-level font in (a). A three-dimensional intensity map (height = log
intensity) appears in (b). This image has been convolved with two "corner-masks" in (c) and (d).
Detecting corners from such measurements alone is not an easy task. This illustrates why it is
difficult to compute a description of an image directly from measurements that are not
directionally selective.

FIGURE 6. The first step in computing the primal sketch of the image CHAIR is to compute a
description of the gray-level changes at each of eight orientations. The results of doing this
at four orientations are shown here. The orientations are arranged clockwise from the
vertical (a), 22.5 degrees to the vertical (b), horizontal (c), and 45.0 degrees to the horizontal
(d). The descriptions were obtained by scanning every other line perpendicular to the
orientation of the masks. Each division on the axes represents ten image elements. Two sizes
of bar-mask and one edge-mask were used. This included a bar-mask of panel-width 2 and
length 10, in addition to the masks shown in figure 2a. Each of the letters in each figure
represents an assertion like that given in the legend to figure 3. The axes are marked at
multiples of 10 picture elements.

The method may be extended to two dimensions by carrying out the analysis
simultaneously at several different orientations. It is preferable to use orientation-dependent
measures for making the initial measurements, for reasons that are illustrated by figure 5. The
image (5a) of a chair (128 points square), whose half-tone image is figure 4a and whose
intensity distribution is shown in figure 5b, has been convolved with "corner-shaped" masks.
The results appear in figures 5c & d, but can the reader confidently distinguish the corners
from these measurements? The reason for the failure is that the inverse transform to that
produced by a corner-shaped mask depends critically on the boundary conditions that obtain.
Any method that computes a corner assertion is saying something about this inverse and so
must take enough information into account at each point to satisfy the dependence on
boundary conditions. This extra information may be supplied by looking at the results of the
corner mask at neighboring points or by looking at the results of some other measurement
taken in parallel; the important point is that the computation is not a trivial one and it has to
take these extra factors into account.

The way to avoid the difficulty is to make the masks so orientation-dependent
that they push the problem back into one dimension. To take account of the boundary
conditions associated with edge- or bar-shaped masks, one needs to compare quantities in
only two directions, rather than in all directions round a point. This makes it inherently
simpler to compute the primal sketch from measures obtained with such masks, and it is why
we use them. Notice that this argument is independent of presumed properties of the image.
It is not impossible to compute the primal sketch from measures that are not directionally
selective, but a persuasive case would have to be made for choosing them.

Combining orientation-dependent descriptions

The number of different orientations at which the analysis needs to be carried
out is fixed by the first stage at which the local assertions are glued together. The sensitivity
of the masks is not so important, as we can see by calculating their orientation tuning curves.
The ratio of panel-length to panel-width in the masks that we use is about 5:1 (figure 2a). If
such a mask is rotated about a step-change edge, the angular distance between the maximum
response and 1/√2 of the maximum is about 35 degrees, so their natural tuning curves are
very broad.

Much more critical is the flexibility with which individual elements are combined
to form assertions about small edge-segments. This process is the beginning of the grouping
phenomena that seem to be central to early visual processing, and designing it has been the
main stumbling block in writing competent edge-detectors. One of the best of them (Horn
1973) requires that lines should have length 10 before evidence of their existence is accepted
as compelling. It was designed this way because if substantially shorter elements are
accepted, a large amount of "noise" appears in the output. Blobs and blotches, common in
textured images, often give rise to elements that are shorter than this, so ways have to be
found of dealing with the noise.

Figure 6 gives some examples of the data with which one has to deal. This
shows the primary analysis of CHAIR at the vertical (6a), 22.5 degrees to the vertical (6b),
horizontal (6c) and 45.0 degrees to the horizontal (6d). For each mask orientation, the image
has been scanned along every other line perpendicular to the mask, and every point along
each scan line was considered. We have to use a fine scan because the smallest masks used
were so tiny. Each symbol E in figure 6 represents an assertion like that given in the legend
to figure 3. With this scan, it is sufficient to use a primary grouping that operates
independently along eight orientations 22.5 degrees apart. The grouping requires that the
types of adjacent primary assertions (represented by the E's) should roughly match (for
example EDGE matches EXTENDED-EDGE but not LINE), and that the relative positions of the
two assertions should be appropriate. Edges whose orientations lie midway between two
scanning directions are sometimes found by both neighboring scans, which shows that eight
orientations are sufficient at this stage. Some technical problems have to be dealt with
before this process will work successfully, but they are too minor to be treated here (see
Marr 1976).
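
A rough sketch of such a primary grouping, for one mask orientation, is given below. The
representation of an assertion as a triple (scan line, position along the scan, type), the
permitted shift max_shift and the rule that only EDGE and EXTENDED-EDGE match one another
are assumptions made for the example; the text fixes only that EDGE matches EXTENDED-EDGE
but not LINE, and that relative positions must be appropriate.

```python
EDGE_LIKE = {"EDGE", "EXTENDED-EDGE"}


def types_match(a, b):
    """Identical types match, and EDGE matches EXTENDED-EDGE (but not LINE)."""
    return a == b or (a in EDGE_LIKE and b in EDGE_LIKE)


def glue(assertions, max_shift=1):
    """Chain assertions found on consecutive scan lines of one orientation.

    assertions: list of (scan_line, position, kind) triples
    max_shift:  largest change in position allowed between adjacent scan lines
    """
    chains = []
    for item in sorted(assertions):
        line, pos, kind = item
        for chain in chains:
            last_line, last_pos, last_kind = chain[-1]
            if (line - last_line == 1
                    and abs(pos - last_pos) <= max_shift
                    and types_match(kind, last_kind)):
                chain.append(item)
                break
        else:
            chains.append([item])   # nothing to attach to: start a new chain
    return chains


assertions = [(0, 5, "EDGE"), (1, 5, "EXTENDED-EDGE"), (2, 6, "EDGE"),
              (1, 9, "LINE"), (2, 9, "LINE"), (3, 6, "LINE")]
chains = glue(assertions)
```

In the example the three edge-like assertions are glued into one segment, the two LINE
assertions at position 9 into another, and the LINE at (3, 6) is left alone because its
type does not match the EDGE immediately before it.
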

By the time the primitive elements have been assembled into straight
edge-segments, evidence that they originated from eight scans has almost evaporated. It is
advisable not to quantize the orientations of the glued edge-segments, because doing so can
cause confusion between a straight line and one containing many small kinks. It is however
possible to devise a discrete representation system for the segments, in which a segment of a
given orientation is represented by linear interpolation between fixed, standard orientations.
Most schemes of this sort require some mutual "inhibition" between carriers of neighboring
components in order that the contrast of the intermediate edge should be represented linearly
(see Marr 1976). Such inhibition arises for purely representational reasons. The main force
behind the initial gluing process is the consistency relations between nearby primitive
elements.

Nevertheless there turns out to be a need for competition between scans at
different orientations, that arises for reasons which are intrinsic to the analysis, not just from
a representational convenience. The surprising part is that the competition is required not
between segments at nearly adjacent orientations, but between ones that are nearly
perpendicular.

Figure 7 illustrates the problems that arise. The image of a rod (figure 4b,
figures 7a and 7b) was first operated on at eight orientations with the process described in
the last section. Next, these local assertions have been glued along directions nearly parallel
to the masks from which they were obtained. Each edge-segment in figures 7c & d
represents several of the E7s of the type shown in figure 6, and the database records all of
the parameters associated with each segment. Quantities like the edge type, contrast and
fuzziness are specified at intervals along the longer segments, since they can change along
them. The longer segments should properly be regarded as a sequence of collinear short
segments. In a full vision system, discontinuity of binocular disparity or motion along such
an edge could still prevent the assembly of its subsegments into a single unit.

The feature of the data that is relevant to inter-orientation competition is the
abundance of short segments roughly perpendicular to the primary edge (figure 7c). These
are caused by a combination of local noise, the image tessellation, and irregularities in the
image. They occur in every image that we have processed. In dealing with them, one cannot
dismiss in a cavalier manner all very short segments: tiny "blobs" in the image also give rise
to them, as can be seen from the same image at coordinate (73, 75). But a "small" element
like this can be ignored if (a) it crosses a "long" element, and (b) its contrast is less than that
of the item it crosses. Figure 7d shows the results of removing small noise elements using
this criterion. Occasionally, two small noisy segments can accidentally become aligned,
creating a longer noisy segment. These are eliminated in the same way.
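
The two-part criterion can be made concrete with a short sketch. It assumes segments are stored as end points with a single contrast value; the `Segment` class and the thresholds `small_max` and `long_min` are hypothetical, since the text does not give numerical definitions of "small" and "long".

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Segment:
    x1: float
    y1: float
    x2: float
    y2: float
    contrast: float

    @property
    def length(self):
        return ((self.x2 - self.x1) ** 2 + (self.y2 - self.y1) ** 2) ** 0.5

def _side(ax, ay, bx, by, cx, cy):
    """Signed area: which side of the line a->b the point c lies on."""
    return (bx - ax) * (cy - ay) - (by - ay) * (cx - ax)

def crosses(s, t):
    """True if segments s and t properly intersect."""
    d1 = _side(t.x1, t.y1, t.x2, t.y2, s.x1, s.y1)
    d2 = _side(t.x1, t.y1, t.x2, t.y2, s.x2, s.y2)
    d3 = _side(s.x1, s.y1, s.x2, s.y2, t.x1, t.y1)
    d4 = _side(s.x1, s.y1, s.x2, s.y2, t.x2, t.y2)
    return d1 * d2 < 0 and d3 * d4 < 0

def remove_noise(segments, small_max=3.0, long_min=10.0):
    """Drop a small segment only if (a) it crosses a long one and
    (b) its contrast is less than that of the segment it crosses."""
    long_ones = [s for s in segments if s.length >= long_min]
    kept = []
    for s in segments:
        is_noise = s.length <= small_max and any(
            crosses(s, l) and s.contrast < l.contrast for l in long_ones
        )
        if not is_noise:
            kept.append(s)
    return kept
```

An isolated short segment, such as one arising from a tiny blob, survives this filter because it crosses nothing.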

The crosses in the figure (sometimes rotated to avoid alignment with the edge
segment to which they are attached) signify that the contrast of a directed segment changes
rapidly at that point, possibly becoming zero. They are the precursors of assertions about
the presence of terminations, and may be thought of as identifying the exact position of a
termination if one exists near there. The problems that arise in obtaining them are dealt with
elsewhere (Marr 1976).

FIGURE 7. The second step in computing the primal sketch. After the intensity changes have
been described independently at each of 8 orientations, and after local linear assembly of
these descriptions has taken place, the eight descriptions are combined. This process is
illustrated here for a particularly simple image, of the rod whose half-tone representation
appears as figure 4b. The printed version of the image appears as (a), and the intensity map
as (b). The results are combined to give the data shown in (c). Each tiny line segment
corresponds to two or more individual assertions (like those illustrated in figure 6), and a
summary of the information associated with each of those assertions (as in figure 3) is made
at intervals along each segment. Only information about the position of the segments and
about the precursors of termination assertions (shown as crosses) can conveniently be
represented in a diagram; this can give a misleading impression of some items in the primal
sketch. For example, many of the lines on the curved part of the rod on the left of the image
arise from shading edges. They describe the gradual intensity changes that take place there,
and should not be thought of in the same way as sharper edges. Short noise elimination then
takes place to give (d), which gives a fair idea of the messiness of the uninterpreted primal
sketch.

FIGURE 8. The difference between the primal sketch and a feature-point array is brought out
by an intensity distribution like (a). A measurement taken with a large mask (b) could
[...], but it would not be used in the computation of the primal sketch.
This is because the sharp contrast changes operate through the selection criterion to force
the description to be computed from small masks like those shown in (c). The final description
is of two blobs, which define "places" (d).

One other item of note in computing the primal sketch is the question of
detecting local, small blobs. Figure 7d at coordinate (73, 75) shows how they appear, and in
fact we make small blobs a primitive element of the primal sketch, together with their
associated contrast, and the sizes and orientations of their major and minor axes. The
defining criterion for when an image item is small enough to be called a blob or a line is that
it should be indivisible. This occurs when one of its dimensions is comparable with the
resolution of the analysis of the image at that point (about 5 image elements in length).
Finding blobs from the glued assertions depends a small amount on elegant programming, and
a large amount on brute force (Marr 1976).

Some consequences

As a model of the information-processing that is performed in Area 17 of the
monkey, these ideas have one main consequence whose disproof would destroy the theory. It
is that the direct output of a linear simple cell is not available centrally. Its signal is used to
create an assertion about the presence of an edge, and that assertion is what is available.
Creating the assertion is an act of computation - a simple one, since it involves little more
than peak matching, applying the selection criterion, and the classification of a peak
configuration; but it is an act of computation nonetheless. The main point is that this has to
go on, and one should therefore be able to find experimental evidence of it.

A consequence of this view is illustrated in figure 8. Suppose that an image
contains two small close blobs. These blobs give rise to measurements by a number of sizes
of mask - some small ones represented by the tiny line segments, and some large ones, like
the one that is illustrated. One's a priori inclination might be that a large "line-detector"
would fire, and that this would have something to do with seeing the two blobs. This view
amounts to supposing that simple cells write directly into a feature-point array that is freely
available to subsequent processes. But if our theory is correct, although the large "simple
cell" may indeed fire, its measurement will not be used to compute the description of the two
blobs because their sharp boundaries cause the associated intensity change to be described
from peaks in the small masks (by the selection criterion). The selection criterion (figure 1d)
will cause the description to be computed from the smaller masks unless the blobs are
severely defocussed.

Another interesting point is that we fail to "see" Abraham Lincoln in L. D.
Harmon's coarsely sampled and quantized image of him (reproduced by Julesz 1971 p. 11). If
measurements from linear simple cells were freely available to later processes, and if we
were able to select them by receptive-field size, we would presumably be able to interpret
that image without physically defocussing it. According to the present theory the mask size
used to compute the description is chosen by the selection criterion. This is consistent with
Harmon & Julesz's (1973) finding that noise bands spectrally adjacent to a picture's spectrum
are most effective at suppressing recognition, since these have most effect on mask response
amplitudes near the important mask sizes. Furthermore, because two peaks in the graph W
would cause the algorithm to create two local edge assertions (with different degrees of
fuzziness), it also explains why removal of only the middle spatial frequencies from such an
image leaves a recognisable image of Lincoln behind a visible graticule (figure 1d of Harmon &
Julesz 1973).

The structure of the raw primal sketch as it is first delivered from the image
may be summarized as follows:

PS1. The primary visual processor delivers a symbolic description of the intensity changes
present in an image. This description uses the following primitives to describe intensity
changes:

(i) Various types of edge;

(ii) LINEs or thin BARs;

(iii) BLOBs.

The items (i) and (ii) have been assembled into straight segments, and short noise elimination
has occurred.

PS2. The following items are computed and bound to each element of the
description.

(i) ORIENTATION - of an edge, line or bar; of the major axis of
a blob or a group.

(ii) SIZE - length and width if both are defined, diameter if
major and minor axes are equal or undefined.

(iii) Local CONTRAST.

(iv) POSITION.

(v) TERMINATION POINTS.
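
As an illustration only, the summary PS1-PS2 might be held in a record like the one below. The class and field names are hypothetical; the primitive names follow the vocabulary given in the abstract, and the `size` method follows rule (ii) of PS2.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple

class Kind(Enum):
    EDGE = "EDGE"
    SHADING_EDGE = "SHADING-EDGE"
    EXTENDED_EDGE = "EXTENDED-EDGE"
    LINE = "LINE"
    BLOB = "BLOB"

@dataclass
class Element:
    kind: Kind
    position: Tuple[float, float]
    orientation: Optional[float]          # degrees; None if undefined
    contrast: float
    length: Optional[float] = None
    width: Optional[float] = None
    terminations: Tuple[Tuple[float, float], ...] = ()

    def size(self):
        """Length and width if both are defined and differ, otherwise one diameter."""
        both = self.length is not None and self.width is not None
        if both and self.length != self.width:
            return (self.length, self.width)
        return self.length if self.length is not None else self.width
```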

What drawings tell us

The second step of the argument depends on our ability to interpret simple
pencil drawings that lack semantic content. By examining suitable drawings, we can infer with
some confidence that certain symbolic grouping operations must exist in our visual systems.
In order to establish the principle that grouping processes sometimes exist, let us first take
an extreme case. When one looks at figure 9a, there can be little doubt that some process is
creating a circular contour joining the inner ends of the radial lines. The path of this contour
is marked by an apparent change in brightness, less than but comparable to that observed in
the Kanizsa triangle illusion.

In deciding how this comes about, we may distinguish three rival theories. (1)
A local process operates to join neighbouring ends of lines. (2) The inner contour is
constructed by some mechanism that relies upon the placing of an edge-shaped mask in the
position shown in figure 9b. (3) The radial lines cause a "Gestalt" of a "sun" to be instantiated
for describing the situation. This very high-level concept then imposes the contour on the
figure.

If (2) were correct, it would disprove the primal sketch theory, since it requires
that a mask output value be identified in a simplistic way with an assertion about a contour.
Illustration 9c disproves (2) however, because the contour remains visible despite the
presence of an intensity distribution that would remove or negate the mask values on which
(2) depends. If (3) were correct, it would imply that downward-flowing information has a
great influence on early processing - a view which runs counter to the second main thrust of
the present theory. Theory (3) assumes a sensitivity to radial lines. The lines in figure 9d
are however also radial, and this is not immediately obvious.

The possibility remains that some combination of (1) and (3) is what really
governs our perception of the figure. The important point is that the initial acquisition of the
"sun" concept probably relies on the mechanisms in (1). Once accessed, this Gestalt may
influence the computations to the extent of deciding that the sun part is the foreground and is
therefore slightly brighter, but such an influence determines only one bit of the final
description. Figure 9e makes it unlikely that the particular "sun" gestalt has even this effect,
since it provides a similar example in which "ends-of-things" form a perceptually "brighter"
obscuring region. It is more likely that the relative brightness reflects a (context-sensitive)
assumption about the sign of foreground-background contrast.

These examples establish that abstractly defined places in an image can be
assembled into contours that have a definite perceptual existence, and that this operation
probably precedes the access and application of higher-level concepts to the image. From a
computational point of view, it is natural to think of the phenomena as occurring in two steps.
Firstly, certain things in drawings can cause "place-tokens" to be defined in some abstract
sense. Secondly, place-tokens so defined can be grouped in various ways.

In how many ways may place-tokens be defined, and in what ways may they be
grouped? We see from figure 10a that a short line may define a place-token, and from figure
10b that a small blob may also do so. The end of a line that is not too short, or of a blob with
long major axis and short minor one, may also define a place-token. (The imprecision of the
boundary between "too long" and "too short" is inconsequential, because near it, both
definitions usually lead to the same groups. The boundary seems to be in the region of 0.5 to
1 degrees of arc at human foveal resolution.) Small collections of blobs (figure 10c) or of
lines (figure 10d) may also be treated as a unit. Because of the variety of ways in which this
may be done (figure 10e), it is probably implemented by the rule that a group of place-tokens
may also define a place-token, rather than by different rules for groups of blobs, groups of
lines, groups half of blobs and half of lines and so forth. Hence although place-tokens can be
described and to some extent selected by properties of items at that place in the image, the
grouping processes themselves read place-tokens and are insensitive to the particular way a
place-token was obtained. The notion of a place-token is a good example of the principle of
explicit naming, and the separation of the way in which a place-token is defined from the way
in which it is grouped illustrates the principle of modular design.
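
The recursive rule, that a group of place-tokens may itself define a place-token, can be sketched as below. The `PlaceToken` class, the use of the centroid as the position of a group, and the `collinear` test are all assumptions made for illustration. The point of the sketch is that the grouping test reads only positions and never inspects how a token arose.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class PlaceToken:
    x: float
    y: float
    orientation: Optional[float] = None
    source: str = "blob"                      # how the token arose; never read by grouping
    members: Tuple["PlaceToken", ...] = ()

def group(tokens, source="group"):
    """A group of place-tokens defines a new place-token (here, at the centroid)."""
    n = len(tokens)
    cx = sum(t.x for t in tokens) / n
    cy = sum(t.y for t in tokens) / n
    return PlaceToken(cx, cy, source=source, members=tuple(tokens))

def collinear(tokens, tol=1e-6):
    """Collinearity test that reads positions only (tokens must span two distinct points)."""
    x0, y0 = tokens[0].x, tokens[0].y
    x1, y1 = tokens[-1].x, tokens[-1].y
    return all(
        abs((x1 - x0) * (t.y - y0) - (y1 - y0) * (t.x - x0)) <= tol for t in tokens
    )
```

A row made of a line end, a blob, and a small cluster of blobs is then judged collinear exactly as a row of three blobs would be.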

The recursive character of the definition of a place-token leads one to expect
that the grouping processes responsible for them read and write into the same storage.
Otherwise, one would have to maintain many copies of the storage and grouping processes,
instead of just one. If only one copy is kept, two organisational rules must be observed.
Firstly, whenever a set of place-tokens is grouped to form a new one, only the new token is
subsequently visible to the grouping processes; and secondly, there is a priority system that
operates among competing processes such that (for example) very local groups usually take
precedence. Interaction between rival local groupings is often necessary to arrive at a
grouping satisfactory to them all. In figure 10e, the local groups are formed before their
organisation into a line. Grouping processes are sensitive to orientation, intensity (lightness),
fuzziness, and various measures of the size of an item in the image, as well as to spatial
proximity and collinearity. For example, orientation information may or may not be present
(10a, 10b, 11a, 11b), and if present, it may (11a, 11b) or may not (10a) be used. Indeed
these two situations can occur in the same figure (11c). Combinations of spatial proximity and
of similar orientation are often important.

FIGURE 9. The illusory contour in (a) is somewhat similar to the Kanizsa triangle. It cannot be
due to a simple cell in configuration (b), because the contour is still visible in (c). It cannot be
due to a gestalt of the sun induced by radial lines, because the lines in (e) are radial, yet this
is not readily apparent. A similar illusion is present in (d), suggesting that the apparent
brightness of the inner disc reflects a default assumption about foreground-background
contrast, rather than any high level influence. The theory attributes the contour to local
processes that join nearby ends of lines. Such processes are mechanisms of construction,
rather than mechanisms of detection.

FIGURE 10. Place-tokens may be defined in an image in several ways, and may then be
aggregated by certain standard techniques. Small lines (a) or blobs (b) may define a place-
token. So may small collections of places (c and d). The definition and the grouping of place-
tokens may be regarded as independent processes, because grouping does not depend on the
way the place-tokens were defined. This is shown by (e), in which every subgroup is defined
differently, yet the collinearity of all of them is immediately apparent. Information such as
orientation may be bound to a place-token because it was intrinsic to the element that gave
rise to it. Such information may be used to help grouping.

We see from these examples that place-tokens can be grouped into regions
directly, or into curvilinear assemblies that define regions by acting as their boundaries. The
Gestalt psychologists were aware of these grouping phenomena (Wertheimer 1923). In
addition to the region-defining facilities just mentioned, if the number of places involved is
very small (less than 5 say), the places may form a standard, named configuration (see figure
11e-h) which is evidently described relative to an axis which is imposed on the figure, and
whose default value is the vertical.

Separating figure and ground

Before the digression of the last section, we had reached the point of defining
the raw primal sketch, and of showing how to compute most of the quantities in it. We also
examined the primal sketch of a very straightforward image, of a rod. The primal sketch is
rarely as simple as that, however. Figures 13, 18, 19 and 21 contain examples of the primal
sketches of more complex images, and as one might expect, they are in general large and
unwieldy collections of data. Furthermore, it is difficult to see how the complexity of the
primal sketch could be an artifact of our particular choice of primitives: images really are
complex in this way.

The unwieldy nature of the primal sketch creates what appears to be the main
task of the next stage of visual information processing: how do we select regions that should
be treated as unit forms by subsequent descriptive processes; and can this be done without
complex interactions between the primal sketch and hypotheses about the nature of the forms
that are being targeted? In perceptual terms, the computational problem that we must now
address corresponds to distinguishing between figure and ground, and it is strongly related to
the problem of texture vision (Julesz 1971 e.g. pp 106 ff). In neurophysiological terms, if
area 17 roughly speaking computes the primal sketch, we come now to the problem that the
next stage must solve.

We have now reached the core of the first part of the theory. We saw in the
last section that certain computational facilities exist and are deployed during our reading of
certain kinds of drawings. It is of course possible that their existence is no more than a
happy accident, which fortuitously allows us to interpret the idle scribblings of the artistically
gifted. The present theory was however founded on the observation that drawings and
images appear surprisingly similar. It takes the view that the processes exhibited by the
drawings of figures 9, 10 and 11 are not empty examples. The ability to perceive the
envelope of a tree, a row of bushes, or even the border of a grass lawn can depend on such
processes, and they are part of the reason why computer vision has had such problems
finding object boundaries in the past. A central argument of this article is that these grouping
processes are available precisely because they are needed to help interpret the primal sketch;
and furthermore that these symbolic processes, together with first-order discriminations,
acting recursively on the description in the primal sketch, are sufficient to account for most
of the range of non-attentive vision of which we are capable within the class of images to
which this article is restricted. In other words, the extraction of forms, and associated
"texture" discriminations are actually implemented by first order discriminations, together with
a small number of grouping operations, acting on the primal sketch of the image. We now
study in more detail the grouping operations on which the second part of the theory depends.

Grouping techniques

The purpose of the grouping techniques outlined here is therefore to partition
the primal sketch into unit forms, in a way that is useful for subsequent recognition. The
important question concerns the extent to which hypotheses about the nature of a form need
to interact with the processes that extract it. The issue is one of degree, not principle, since
we shall show that some downward-flowing information may be necessary to complete
segmentation. The demands of speed and fluency make it desirable to minimize these
downward influences, and our main conclusion is that for most images, such influences affect
only a small number of the decisions taken during grouping.

The most important guideline for the design of grouping techniques is the
principle of least commitment. According to this principle, each step is irreversible. Hence
only groupings that are reasonably certain may be made. This forces one to decompose the
overall process into several steps, and to take advantage of as many cues as possible to help
in the decisions that are made at each step.

Curvilinear aggregation

We define curvilinear aggregation to mean the assembly of place-tokens that
contain an orientation into a group that preserves it. This type of aggregation is one-
dimensional rather than two, and the discovery and use of the appropriate local orientation is
central to it. We shall see that one-dimensional grouping processes are by far the most
important kind. Two-dimensional grouping seems to be necessary only to classify larger regions
that are characterised by a texture predicate and best found by computing their boundaries.

Information that determines whether two items should be grouped comes
initially from their primal sketch parameters and spatial dispositions. The primal sketch
parameters are orientation, contrast, type (EDGE, LINE etc.), and fuzziness. Spatial information
includes the distance between the nearest parts of the two items, and the relationship
between the orientations associated with the items and the orientation of the line joining their
nearest parts.
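
The spatial quantities just listed can be computed directly. The sketch below assumes each item is a straight segment given by two end points, and approximates "nearest parts" by the nearest pair of end points; the function names and the returned keys are hypothetical.

```python
import math
from itertools import product

def orientation(p, q):
    """Orientation of the line through p and q, in degrees, folded into [0, 180)."""
    return math.degrees(math.atan2(q[1] - p[1], q[0] - p[0])) % 180.0

def angle_between(a, b):
    """Smallest angle between two orientations, in [0, 90] degrees."""
    d = abs(a - b) % 180.0
    return min(d, 180.0 - d)

def spatial_relation(seg_a, seg_b):
    """Gap between nearest end points, and how each item's orientation
    relates to the other's and to the line joining those end points."""
    pa, pb = min(product(seg_a, seg_b), key=lambda pq: math.dist(pq[0], pq[1]))
    gap = math.dist(pa, pb)
    oa, ob = orientation(*seg_a), orientation(*seg_b)
    join = orientation(pa, pb) if gap > 0 else None   # undefined if the ends coincide
    return {
        "gap": gap,
        "between_items": angle_between(oa, ob),
        "a_to_join": None if join is None else angle_between(oa, join),
        "b_to_join": None if join is None else angle_between(ob, join),
    }
```

Two collinear segments give zero for all three angles, whereas a segment meeting another at right angles gives 90 degrees for two of them.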

Because of the principle of least commitment, the first stage of grouping



FIGURE 11. (a) and (b) give examples of groupings in which orientation is important. In (c),
orientation is important for constructing the square, but not for perceiving the collinearity of
the rotated V's across the middle. Information about similarity of orientation is used if it can
be, but it is not essential if it cannot be. (d) shows how the orientation of a small aggregate
can be used to form a larger aggregate. Evidence like this suggests that the results of these
primary aggregation processes are written into the same storage as the primal sketch. (e) to
(h) give some examples of "standard configurations" that we have found it useful to recognize.
The reader will probably perceive them relative to a vertical axis. The VEE shown in (h) is
used in figure 21f.

FIGURE 12. Examples in which semi-local and global constraints can influence local measures
of preference during aggregation. (a) shows a set of place-tokens, and (b) illustrates the
possible pairwise groupings that local neighborhood analysis permits. The situation after the
first pass is shown in (c). Information from this pass makes (d) the preferred link on the
second pass. In (e), the links between 1 & 2, and between 1 & 3 are evaluated as equally
desirable on purely local grounds. The overall closure property creates a preference for the
link that uses 2. In the primal sketch, the affinity between two elements is evaluated
simultaneously along several dimensions. Considerations such as these can often cause a
particular grouping to emerge as clearly preferable to any others.

FIGURE 13. The image PLANT, whose half-tone representation appears in figure 4c, has been
printed in (a). The actual intensity values that occur within the superimposed rectangle have
been set out in table 1. The spatial information from the primal sketch of this image is given
in (b). Typical segments that arise from the first two stages of curvilinear aggregation appear
in (c) and (d). The primal sketch does not contain quite enough information to separate the
two leaves, and the aggregation techniques deliver the form (e). They have however almost
succeeded in the separation. If one piece of information is added (that segment 1 does not
match segment 2), the aggregation routines can separate (e) into (f) and (g).

TABLE 1. The top table shows the intensity values for a small section
of the image PLANT (see figure 13). The lower table gives the
values of edge-mask convolutions over the same region. Only a gradual
decay from the edge above this region is measurable. No general-purpose edge-finder
could discern the edge of the nearer leaf in this part of the image.



X- 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49
Y

58 171 169 167 1G7 1&6 IBS Its 164 1*7 171 17 1 174 174 175 173 171 

57 IBS IBS 165 167 1SS 167 187 1G5 1&3 168 174 176 175 175 175 172 

5-6 16S 167 167 165 i&S IBS 1&7 1G7 1G£ 173 17a 177 176 174 174 173 

&S 155 l&B 165 169 157 15& i&7 165 166 175 177 177 175 175 172 171 

54 163 170 167 169 163 IBS 163 1GB 172 169 174 173 175 17S 173 173 

53 171 169 176 163 IBS 168 163 1GS 1&5 178 175 173 175 177 178 176 

52 17? ■?■ :?;: id. :;■■ l-.j .;..■■ _.:: i :-' j in :■■..• 177 -4 ite 178 176 

SI 172 174 171 175 155 165 157 168 172 172 172 177 173 172 175 175 

5-0 171 167 17G 1G9 176 1G9 165 1G9 171 172 174 174 173 173 174 178 

49 174 172 173 173 173 174 171 171 172 174 172 17? 17? 155 173 173 

46 173 173 173 17G 178 17? 171 174 174 173 175 ITS 175 173 173 171 

47 173 175 178 173 173 171 171 175 175 177 178 175 174 173 175 173 
46" 178 175 174 163 173 175 177 175 177 177 174 17$ 176 177 177 174 
4S 173 175 173 174 172 173 174 175 174 171 173 174 175 174 172 171 
44 177 174 175 173 172 171 172 176 172 173 172 172 173 176 170 175 
43 173 171 174 1GS 17B 172 173 173 173 174 171 174 175 173 174 174 
42 175 173 171 172 170 171 176 175 173 172 174 ITS 175 175 175 172 
41 lfil 179 177 172 170 170 169 179 ITS 174 175 174 172 175 174 175 
4G 186 184 179 178 176 17E 176 174 172 17S 172 174 173 172 174 173 
33 135 131 1&5 1SS 1£5 1S2, 1S3 177 ITS 175 174 ITS 175 174 176 176 
38 230 199 197 1S3 196 187 185 186 176 175 1«0 177 175 175 176 177 

37 ?;\7 78? :':,-:. :-v:. : ia= i:-:^ ie.- :b3 i/b :/\i iw m jx ;?:, i?g 173 



34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49

combines two elements only if they match in almost all respects, are very close to one
another, and if there are no other candidates. This typically reduces the number of groupable
elements to about a third of the number present in the raw primal sketch. The second stage
can then make use of extra information given by the first. Sometimes the only extra clues
are that some segments are now quite long (more than 20 image elements). Such segments
almost certainly have some physical importance, and hence in the second stage it is safe to
combine two such elements even if they fail to match on some parameters, provided that
there are no other reasonable candidates in the vicinity. In some situations, the first stage
will actually have introduced new information which can then be used by the second stage.
For example, figure 12a shows a set of places that are to be aggregated, and the possible
links between nearby places are shown dotted in figure 12b. The first stage of aggregation
inserts the unambiguous segments (12c). By the second stage, an orientation parameter is
present, and this, together with the equal spacing of the collinear tokens, makes the grouping
shown in figure 12d the preferred one.
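
A minimal sketch of the conservative first stage follows. It assumes segments are dictionaries holding a type, an orientation, a contrast and two end points, and it links two segment ends only when each is the other's sole matching candidate within a small gap. The helper names and all thresholds are hypothetical, not taken from the implementation described here.

```python
import math

def seg(p, q, kind="EDGE", orientation=0.0, contrast=40.0):
    """Build a segment record (illustrative layout)."""
    return {"type": kind, "orientation": orientation, "contrast": contrast, "ends": (p, q)}

def matches(a, b, max_angle=10.0, max_contrast_ratio=1.5):
    """Two segments match if type, orientation and contrast all agree closely."""
    d = abs(a["orientation"] - b["orientation"]) % 180.0
    d = min(d, 180.0 - d)
    lo, hi = sorted((abs(a["contrast"]), abs(b["contrast"])))
    return (a["type"] == b["type"] and d <= max_angle
            and lo > 0 and hi / lo <= max_contrast_ratio)

def first_stage_links(segments, max_gap=3.0):
    """Link two segment ends only if each is the other's only candidate."""
    ends = [(i, k) for i in range(len(segments)) for k in (0, 1)]

    def point(e):
        return segments[e[0]]["ends"][e[1]]

    candidates = {
        e: [f for f in ends
            if f[0] != e[0]
            and math.dist(point(e), point(f)) <= max_gap
            and matches(segments[e[0]], segments[f[0]])]
        for e in ends
    }
    return sorted((e, f) for e in ends
                  if len(candidates[e]) == 1
                  for f in candidates[e]
                  if candidates[f] == [e] and e < f)
```

An end with two rival candidates is left unlinked, which defers the decision to a later stage rather than risking an irreversible mistake.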

Some results of these two grouping processes are illustrated by the analysis of
the image PLANT, which is exhibited because it raises several points of interest. Figure 13a
gives the printed image whose halftone representation appears in figure 4c, and 13b shows
its primal sketch. Figures 13c and 13d show typical segments obtained by the above
processes. Notice the ragged nature of 13d; this is a common feature of the high resolution
analysis of indistinct object boundaries. The local orientation of the raw primal sketch
elements is preserved only roughly here.

Having exhausted all those situations in which aggregation takes place more or
less by default, we turn now to the other technique that characterizes an application of the
principle of least commitment, namely the rejection of relatively unlikely possibilities. The
method is to set up a node for each of the ends of the segments that were delivered by the
preceding processes, and to associate with each node a list of the nodes that could possibly
match this one. Notice how this presupposes that each segment-end can be assigned an
internal name (principle of explicit naming). Each of the possible matches is then evaluated
independently along several dimensions, and possibilities that are graded relatively poorly by
several methods of evaluation, and well by none, are struck out.
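
The striking-out rule can be sketched for a single node as follows. The sketch assumes that each criterion grades a candidate with a score between 0 and 1; the thresholds `poor` and `good`, the minimum count `min_poor`, and the function name are hypothetical.

```python
def strike_out(candidates, poor=0.3, good=0.7, min_poor=2):
    """Remove candidates graded poorly by several criteria and well by none.

    candidates maps a candidate's name to {criterion name: score in [0, 1]}.
    """
    kept = {}
    for name, scores in candidates.items():
        n_poor = sum(s <= poor for s in scores.values())
        n_good = sum(s >= good for s in scores.values())
        if n_poor >= min_poor and n_good == 0:
            continue                      # struck out
        kept[name] = scores
    return kept
```

Adding a further criterion can only reduce the list of survivors, provided that it does not grade an otherwise rejected candidate well.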

Chur present implementation assesses the possible choices Using measures of 
relative contrast^ orientation, alignment or misalignment, dUtance, edg? type, fuiimess, 
whelher an item acls as a good intermediary belween two segments thai match very Well, and 
whether a closed form would be created by choosing a particular segment. The idea behind
this is straightforward. It has long been known to the Gestalt psychologists that in a line
drawing, each of these criteria can cause elements to be grouped together in a "preferred"
way (Wertheimer 1923). In the much richer environment of the primal sketch, there is
frequently enough information available to apply all of these criteria simultaneously. If most
or all of them agree in selecting a particular grouping, one can be certain enough of its
correctness to select that grouping irrevocably. There is nothing special about the way in
which the preferences of the different methods are combined: if an obvious choice exists, it
is taken, and any theory would select it. If the choice is not obvious one needs additional
information, and a theory that happened to make the correct choice on marginal grounds in
one image would fail in many others. The interesting point is an empirical one - that these
crude selection criteria are very effective. They enable one to solve simple images
completely, and almost to solve even quite difficult ones. Applying the criteria is relatively
inexpensive, because the number of segments that exist at this point is much less than the
number of items in the raw primal sketch. This type of filter analysis has the added attraction
of being readily extendable, because the addition of extra filtering criteria simply leads to the
rejection of more of the candidates at a given node.

FIGURE 14. About two people in three fail to perceive the original of this image correctly the
first time. The failure is caused by the accidental alignment of the subject's fore-finger and
nose. This failure shows that simple local processes are important during the analysis of an
image, and that delivery by them of an incorrect grouping is not a normal event. This is good
evidence against the hypothesis that early visual processing is designed around a failure-
driven control structure. The fact that one does not make the same mistake a second time
shows that some downward-flowing information can affect early processing. Only a small
amount is required to prevent recurrence of the error.
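
The filter-style selection described in the text can be sketched as follows. This is only an illustration under stated assumptions: the candidate fields (`orientation_diff`, `contrast_diff`, `closes_form`), the thresholds, and the function name `filter_candidates` are hypothetical and do not come from the memo, which does not give its criteria in this form. The point shown is that each criterion only rejects candidates, that a choice is made only when a single candidate survives, and that adding a criterion can only shrink the surviving set.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Candidate = Dict[str, Any]
Criterion = Callable[[Candidate], bool]  # True means "not rejected"


def filter_candidates(
    candidates: List[Candidate], criteria: List[Criterion]
) -> Tuple[Optional[Candidate], List[Candidate]]:
    """Apply every criterion as a filter at one node.

    Returns (choice, survivors). `choice` is set only when exactly one
    candidate survives; otherwise it is None and all survivors are kept.
    """
    survivors = [c for c in candidates if all(ok(c) for ok in criteria)]
    choice = survivors[0] if len(survivors) == 1 else None
    return choice, survivors


# Hypothetical criteria; the thresholds are illustrative assumptions.
def similar_orientation(c: Candidate) -> bool:
    return c["orientation_diff"] <= 20.0


def similar_contrast(c: Candidate) -> bool:
    return c["contrast_diff"] <= 0.3


def closes_form(c: Candidate) -> bool:
    return bool(c["closes_form"])


# Made-up candidate segments competing to be joined at one node.
candidates = [
    {"id": 2, "orientation_diff": 5.0, "contrast_diff": 0.1, "closes_form": True},
    {"id": 3, "orientation_diff": 40.0, "contrast_diff": 0.1, "closes_form": False},
    {"id": 4, "orientation_diff": 8.0, "contrast_diff": 0.2, "closes_form": False},
]

# With two criteria, two candidates survive and no choice is made.
choice, survivors = filter_candidates(
    candidates, [similar_orientation, similar_contrast]
)

# Adding a further criterion only rejects more candidates.
choice2, survivors2 = filter_candidates(
    candidates, [similar_orientation, similar_contrast, closes_form]
)
```

Because the criteria are independent predicates, extending the analysis amounts to appending another function to the list.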

All of the filtering criteria described above are local in the computational sense
that they do not depend on the results of subsequent higher-level processes. But this does
not mean that the criteria are spatially local. For example, which of the two segments 2 or 3
should be joined with segment 1 in figure 11e? No preference exists on purely local grounds,
but a decided preference arises from the closure property of the whole figure. Only a limited
degree of sensitivity to connectedness appears to be present in human visual systems (Minsky
& Papert 1969 p.73), but it is not hard to devise a detection scheme that would operate
sufficiently well to help in decoding many images, while failing to provide a complete
sensitivity to connectedness.

A detailed account of the selection criteria that appear to be useful will be
given in a separate article, but if these methods are taken as a theory of part of our own
visual systems, there is one consequence that would follow from even the sketchy account
given here. If it were true that most of the time, decisions about local groupings are taken
using criteria computed at roughly the same stage of the analysis, rather than by extensive
use of downward-flowing information, it should be possible to find images in which a
particular grouping is greatly to be preferred on most of the criteria described here, but
which is nevertheless incorrect. Furthermore, if low-level decisions are indeed irrevocable (as
the principle of least commitment asserts), their failure should cause severe damage to the
perceptual analysis of an image. Occasionally, one finds a photograph in which the accidental
alignment of contours causes this to happen, and figure 14 shows an image whose original is
misinterpreted the first time by about two people in three. The accidental alignment of the
forefinger with the nose appears to be responsible for the failure. It is interesting that one
does not make the same mistake the second time one views the picture; and that in the real
world where stereo disparity and motion information are also available, one almost never fails
at the same low level.

Trammittitm &f u./\moI ard twin 

The next important consequence of the principle of least commitment is that if
no clear leader emerges from the group of contending possibilities, all possibilities that were
not rejected are accepted. No arbitrary choices are made this early in the analysis. Nodes at
which an ambiguity exists are marked, and themselves form part of the information that is
sent to the next stage in the processing. The reason for doing this is that subsequent
processes then have access to whatever trouble-spots exist lower down. In the image PLANT,
part of the nearer leaf happens to have the same intensity as its background. Table 1a shows
the actual intensity values in the rectangle (34, 37) to (49, 53), and table 1b shows the
approximate edge-mask convolution values there. Although some intensity changes do exist
above this area (near (44, &&)), they are insufficiently distinguished to allow the grouping
methods described above to separate the two leaves. Accordingly, all of the segments are
included in one form, shown together with the segments it contains in figure 13e. (It has
been separated manually from the stem, for clarity).

If the nodes that support this figure are maintained and can be influenced by
subsequent processes, the amount of information needed to separate the two leaves is very
small. For example, one decision can suffice: if it is asserted that segment (1) does not match
segment (2), this information is sufficient to allow the aggregation filter network to decompose
the image into the two parts shown in 13f and 13g. So although some downward-flowing
information is needed here, the amount required is small provided that it is applied so as to
use the partial results obtained at the lower level.
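
The way a single assertion can split a form may be illustrated with a small graph sketch. This is a stand-in under stated assumptions, not the memo's aggregation filter network, whose internals are not given here: segments are modelled as nodes, accepted matches as undirected links, and a form as a connected component. The segment numbers, the links, and the function name `forms` are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def forms(nodes: List[int], links: List[Tuple[int, int]]) -> List[Set[int]]:
    """Return the connected components of the segment graph.

    Each component stands for one form, i.e. a set of segments joined
    by accepted matches.
    """
    adjacency: Dict[int, Set[int]] = {n: set() for n in nodes}
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen: Set[int] = set()
    result: List[Set[int]] = []
    for start in nodes:
        if start in seen:
            continue
        component: Set[int] = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        result.append(component)
    return result


# Hypothetical segments; the link (1, 2) is the one marked as ambiguous.
segments = [1, 2, 3, 4, 5]
matches = [(3, 1), (4, 3), (1, 2), (2, 5)]

# With the ambiguous link accepted, everything falls into one form.
before = forms(segments, matches)

# Asserting "segment 1 does not match segment 2" removes one link only.
remaining = [link for link in matches if set(link) != {1, 2}]
after = forms(segments, remaining)
```

Only the marked link is touched; every other grouping decision made at the lower level is reused unchanged.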

Theta-aggregation

The techniques described above group items that possess an intrinsic
orientation (or acquire one early in the processing), in a direction that approximates their
local orientation. Theta-aggregation is the name we have given to the process of grouping a
set of similarly oriented items in a direction that differs from their intrinsic orientation, but in
a manner which uses it (e.g. figure 11a). The technique is to use very local grouping
measures to form place-tokens that have an orientation associated with the group rather than
with the local elements, and then to apply curvilinear aggregation to these tokens. The
difficult part about it is that measures of the 'overlap' of two neighboring oriented items
depend upon the angle, theta, that the aggregate makes with each local unit (see figure 15).
So theta determines the aggregation process, but also depends upon it. For good data, it may
be quite unnecessary to know theta; aggregation of the places that each individual element
defines will suffice to compute the aggregate. In general however, one will need to take into
account the relation between the overall direction of the aggregate and the orientation of the
local elements. Viewed from a very abstract level, this computation may be regarded as a
process of solving a large number of rather simple equations.


Grouping into neighborhoods and regions

The second category of grouping operations concerns the selection of a region
by the presence there of some distinguishing local property. We first examine the nature of
the local properties on which such grouping operations are based, and secondly we make
some brief comments about the grouping techniques that operate on them.

Semi-local measures. From an abstract point of view, the primal sketch is simply a large body
of data. There is therefore no difficulty in extracting from it certain features and statistics,
computed from the parameters that are bound to the elements of the sketch. Such measures
provide a useful coarse description of a neighborhood in the image. They can be used to
control the type and depth of the analysis that is applied to a region, or to select
neighbourhoods for subsequent grouping into regions. In particular, we shall assume that
over moderately sized regions (0.5 to 1.0 degrees at foveal resolution) of the primal sketch,
the following distributions are available to processes that are capable of asking certain
straightforward statistical questions of them:

D0. The total amount of contour, and number of blobs, at different contrasts and intensities.
D1. ORIENTATION: the total number of elements at each orientation, and the total contour
length at each orientation.
D2. SIZE: distribution of the size parameters defined in the primal sketch.
D3. CONTRAST: distribution of the contrast of items in the primal sketch.
D4. SPATIAL DENSITY: spatial density of place-tokens defined in the different possible ways,
measured using a small selection of neighborhood sizes.

The straightforward statistical questions referred to above include such matters as whether
the distribution is uniform, or has one, two, three or more peaks; if peaks exist, where they
are and their relative sizes. If the distributions are very scattered (like orientation
distributions), the corresponding questions are whether the orientations are grouped in a
significant way, or are roughly uniformly spread out. It has been our experience that
straightforward histogram-based selection techniques suffice to drive the initial examination of
an image. For example, to examine the characteristics of the orientation distribution in an
image, one forms an orientation histogram based on ten degree wide orientation buckets. The
figure of ten degrees was obtained empirically, and appears to be suitable for all images. For
spatial grouping on the other hand, the scale at which one applies histogram-based techniques
depends upon the place-token density of the particular image being analysed. Once again, we
have not found it desirable to use elaborate statistical tests. If a property is significant, any
reasonable test would detect it. If a property is marginal, no statistical model can alter the
fact.
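
A minimal sketch of such a histogram-based examination is given below, assuming each primal sketch item is reduced to an (orientation in degrees, contour length) pair. The ten degree bucket width is taken from the text; the peak rule (a bucket that is a circular local maximum and at least `min_ratio` times the mean count) and the names `orientation_histogram` and `peaks` are assumptions made for illustration, since the memo deliberately does not commit to a particular statistical test.

```python
from typing import List, Tuple


def orientation_histogram(
    items: List[Tuple[float, float]], bucket_width: int = 10
) -> Tuple[List[int], List[float]]:
    """Bucket (orientation, length) pairs over the range [0, 180).

    Returns the number of items and the total contour length per bucket.
    Assumes bucket_width divides 180 exactly.
    """
    n = 180 // bucket_width
    counts = [0] * n
    lengths = [0.0] * n
    for theta, length in items:
        # Orientation is defined modulo 180 degrees.
        bucket = int((theta % 180) // bucket_width) % n
        counts[bucket] += 1
        lengths[bucket] += length
    return counts, lengths


def peaks(hist: List[float], min_ratio: float = 3.0) -> List[int]:
    """Indices of buckets that are obvious peaks (illustrative rule only)."""
    n = len(hist)
    mean = sum(hist) / n
    found = []
    for i, h in enumerate(hist):
        is_local_max = h >= hist[i - 1] and h >= hist[(i + 1) % n]
        if h > 0 and h >= min_ratio * mean and is_local_max:
            found.append(i)
    return found


# Made-up data: six items near 60 degrees and three scattered ones.
items = [(62.0, 13.0)] * 6 + [(5.0, 4.0), (95.0, 3.0), (140.0, 5.0)]
counts, lengths = orientation_histogram(items)
dominant = peaks(counts)  # bucket 6 covers 60 to 70 degrees
```

A marginal property, such as a single stray item, does not pass the rule, which mirrors the remark that only obvious features are acted upon.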

The final facility that we require is the ability to select from the image those
areas or items that give rise to obvious features of these distributions. For example, in figure
18 items at an orientation of 60 degrees are strongly predominant. We assume that items at
about this orientation can be selected from the primal sketch for examination by processes
that specialize in grouping such collections together. In another image, one might wish to
examine first all those items whose contrast was greater than a certain value. These facilities
are used only when tests indicate that they should be, and they can help the analysis of an
image by greatly restricting the number of elements in the primal sketch that need to be
considered by a particular process.
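
The selection facility can be sketched as a pair of simple filters over the items of the sketch. The item fields (`orientation`, `contrast`), the tolerance of 15 degrees, and the function names are hypothetical; the only point carried over from the text is that items are selected by proximity to a predominant orientation, or by exceeding a contrast value, before any grouping process examines them.

```python
from typing import Any, Dict, List

Item = Dict[str, Any]


def angular_difference(a: float, b: float) -> float:
    """Smallest difference between two orientations (modulo 180 degrees)."""
    d = abs(a - b) % 180.0
    return min(d, 180.0 - d)


def select_by_orientation(
    items: List[Item], target: float, tolerance: float = 15.0
) -> List[Item]:
    """Keep items whose orientation lies within `tolerance` of `target`."""
    return [
        item
        for item in items
        if angular_difference(item["orientation"], target) <= tolerance
    ]


def select_by_contrast(items: List[Item], minimum: float) -> List[Item]:
    """Keep items whose contrast is greater than `minimum`."""
    return [item for item in items if item["contrast"] > minimum]


# Made-up primal sketch items.
sketch = [
    {"id": "a", "orientation": 58.0, "contrast": 0.9},
    {"id": "b", "orientation": 65.0, "contrast": 0.2},
    {"id": "c", "orientation": 120.0, "contrast": 0.8},
    {"id": "d", "orientation": 178.0, "contrast": 0.7},
]

near_60 = select_by_orientation(sketch, target=60.0)
strong = select_by_contrast(sketch, minimum=0.5)
```

Orientation is compared modulo 180 degrees, so an item at 178 degrees counts as close to one at 2 degrees.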

Boundary of a group of place-tokens. The distributions D0 - D3, and the density of place-
tokens obtained from items in the primal sketch, can lead to the splitting of an image into
regions. The centres of figures 10a & 10b provide simple examples of this (see also Julesz
1971 pp105ff). O'Callaghan (1974a & b) surveyed the literature on dot-grouping studies, and
defined a local operator for obtaining boundary lines of clusters of dots. The idea is that the
shape and extent of the clusters are subsequently computed from the local boundary
elements.

Our experience has been that purely local methods can usually be improved by
adding to them a sensitivity to the "overall" direction of a boundary. The interaction between
local and global information resembles that shown in figure 1?. The overall direction of a
group of place-tokens can be obtained cheaply by finding peaks in their spatial density or its
gradient. Such a mechanism allows one to obtain an overall description of the shape or
orientation of a group of places before a precise assignment of local boundary points has
been made. This is relatively easy to implement, and it has the advantages of speed and
economy that lead one to expect it in our own visual systems.
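
As a rough illustration of obtaining an overall direction before any boundary points are assigned, the sketch below estimates the principal axis of a set of place-token positions from their second moments. This is a substituted technique chosen for brevity: the text proposes finding peaks in the spatial density or its gradient, which is not what this code does. The function name `overall_direction` and the sample coordinates are hypothetical.

```python
import math
from typing import List, Tuple


def overall_direction(points: List[Tuple[float, float]]) -> float:
    """Orientation in degrees, in [0, 180), of the principal axis of the points.

    Uses the second central moments of the positions; this is a stand-in
    for the density-peak method mentioned in the text.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return math.degrees(angle) % 180.0


# Made-up place-tokens lying along a diagonal and along a horizontal line.
diagonal = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
horizontal = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]

diagonal_direction = overall_direction(diagonal)
horizontal_direction = overall_direction(horizontal)
```

The estimate depends only on the positions of the tokens, so it is available before any precise boundary has been computed.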


Relation tit H;rritnj-ui.iiim dijerimirtnliitH 

There are several current ideas on texture processing. Some authors have
used Fourier techniques, and in certain circumstances the spatial power spectrum can
successfully separate different regions (Bajcsy 1972). Others have constructed specialized
operators which sometimes discriminate between regions with different texture. Probably the
earliest example of this was the Roberts gradient (Roberts 1963). The most interesting and
comprehensive proposal is due to Julesz (Julesz (1962), Julesz, Frisch, Gilbert & Shepp
(1973), Julesz (1975)), who showed that visual textures that differ only in their third or
higher order statistical structure are rarely perceptually discriminable; whereas visual
textures that differ in their first or second-order statistics can usually be distinguished. The
important point about this finding lies in its demonstration of the essential simplicity of
texture discriminations. Although it gives little insight into how the processing is implemented,
it does imply that with a Volterra series expansion, all coefficients of terms whose order is
higher than 2 are zero.

The present theory includes texture discrimination with the other techniques
for extracting forms from the primal sketch, and asserts that texture discriminations are
actually implemented by the family of first-order discriminations and grouping processes that
act upon the primal sketch. The class of computations that these processes define differs
from but overlaps considerably with the class of all second-order operators operating on the
original intensity array. Julesz (1975 p<3) mentioned in an aside the possibility that texture
vision may rest on "first-order statistics of various simple feature extractors", but this idea
requires the concepts of the primal sketch and of recursively applied grouping before it can
be brought to fruition. The principal difference between the two approaches is that the
present theory is process-oriented, since it rests on the belief that early processing of visual
information is in fact implemented in this way. The second-order discrimination theory
provides a phenomenological description. As with many other problems of biological
information processing, it will be interesting to see whether the phenomenology can be
described accurately without explicitly defining the underlying computational processes.

So that the reader may form an intuitive grasp of the way in which the present
theory accounts for texture vision discriminations, let us re-examine some of the textures
devised by Julesz, and follow this with some examples of the texture analysis run on some
natural images. Firstly, consider figure 16. Julesz notes that in 16a, the two regions have
distinct second-order statistics, but not in figure 16b. Hence, according to his rule, the two
regions are distinguishable in 16a, but not in 16b. The present theory explains this as
follows: orientation measures are the only distinguishing feature of the primal sketch
representation, because everything else has carefully been held constant. In 16b, the two
basic elements are related by a 180 degree rotation, and so the orientation statistics to which
they give rise are identical. Hence the two regions are indistinguishable. In 16a however,
there is more contour at degrees than at 95 degrees in the central patch, but the opposite
is true in the surround. Hence the two regions are immediately distinguished.

FIGURE 16. Examples of textures devised by Julesz. All four contain a square region that
differs from the background. (a) and (b) obey Julesz's conjecture; in (c), the second-order
statistical structure of the square differs from that of the background, yet we cannot
distinguish the two. In (d), the second-order structure is uniform, yet we can faintly
distinguish the square region. The present theory accounts for these examples, and defines a
set of discriminations that neither contains nor is contained by the set of all second-order
discriminations.

FIGURE 17. The spatial information of the primal sketch of the image CHAIR (figure 4a) is
shown in (a). (b) and (c) show two units that emerge after aggregation, and (d) gives the
skeleton of the chair to which this aggregation leads. (This skeleton was obtained by
selecting the longest edge from each aggregate, and adding the edge whose center lies at (30,
67)). By using the texture that is present in the image, the problem of divining the three-
dimensional shape of the object has been separated from the problem of recognizing its
surface structure (one takes (d) as its data, the other takes units like (b) or (c)). No
downward-flowing information was necessary to accomplish this.

The second example appears as figure 16c. Some of the modules in the pattern
have been reflected about a vertical line through their centers. Their second-order statistics
are therefore different. This is an example in which Julesz's generalization fails. The
orientation statistics of the contours, and of the local groups they form, are however
unchanged because only vertical and horizontal orientations are involved. Hence the present
theory predicts that the two regions should be indistinguishable without scrutiny, as indeed
they are. This establishes that the class of second-order discriminations includes some
operations that are not included in the class defined here.

The aggregation technique that was illustrated in figure 12 provides an example
of a technique whose complexity is higher than second-order. Discrimination of the
distinguished region can just be made in figure 16d, and the reason seems to be that the dots
"string together" better there than in the background. This would be an unusual use of the
aggregation techniques, but it does allow us to distinguish the region from its surround even
though the second-order statistical structures of the two are identical. It does not however
allow us to be confident of the exact boundaries.

Examples of the analysis of some real images

In order to illustrate the usefulness of the theory, we shall now examine the
results of applying it to some images. Figure 17a shows the primal sketch of the chair whose
image appeared in figure 4a. The first thing to realize about this image is that it is textured
at all. The texture is so simple that one easily overlooks it, yet it exists in exactly the sense
of this article. The presence of the texture is suggested by the existence of three clear
peaks in the orientation histogram, and the texture itself is decoded by grouping nearby items
with similar orientations. Figures 17b & c show typical results of running this procedure on
this image.

Each of these aggregates can now be described simply by position, orientation
and extent, and this produces a skeleton of the outline of the chair (figure 17d). By
considering separately the structure of just one aggregate, one could go on to compute a
description of the surface structure of the material out of which the chair is made. Using one
autonomous technique, we have separated (but not of course solved) the problem of divining
the overall three-dimensional shape of the chair from the analysis of its surface properties.
This ability is vital if the organisation of subsequent analysis is to be modular.

The next example shows a difficult case of theta-aggregation. The image
(figure 4d) is not very contrasty because it was taken from a photograph (Brodatz 1966 plate
DJ I J). The intensity values have been printed in figure 18a, and figure 18b shows the spatial
component of the primal sketch. Contours of all intensities, lengths and orientations are
shown, and as one would expect from an image of this complexity, 18b has a somewhat messy
appearance. Part of the mess can be removed by excising elements responsible for the
lowest-contrast peak in the contrast-distribution histogram, but the crucial clues come from
the orientation distribution. Table 2 provides rough information about the amount of contour
that is present at each orientation, from which it is evident that items at an orientation of
around 60 degrees predominate. The average length of items at this orientation is 13. These
coarse measures cause the texture analyzer to attempt to group the edges at this orientation.
Initially, the direction along which grouping should take place is unknown, so stringent local
grouping parameters are used. This leads to the primary cluster shown in figure 18c. From
this, an overall direction is obtained (-53 degs), and curvilinear aggregation then groups the
items into the stripes shown in 18d, e, f, g & h. This completes primary texture processing.
Once the primary stripes have been obtained, the same analysis operating recursively on
tokens for these stripes serves to relate them to one another. Notice that in this particular
image, some of the stripe information has been picked up directly from the intensity values
(see figure 18b). This would not be true of a pure herring-bone texture, and the analysis
does not depend upon it. Our present system is successful at processing herring-bone
textures of similar complexity in which the two types of stripe have the same average
reflectance. Figure 19 demonstrates this. It shows the analysis of figure %, which is a
fragment of Dr. Eric Sandewall's waistcoat.

Finally, I give two examples of images that are simple enough for the
aggregation techniques to extract the important forms unaided. The local elements of the
primal sketch of the rod of figure 7 are grouped by the first two stages of curvilinear
aggregation into the units shown in figures 20a, b & c. The third stage assembles them into
the form shown in 20d. The reason why the first two stages cannot complete the job is
because of the alternatives near (33, fiCfy and because the contrast across the top-left portion
of the form has the opposite sign from the contrast elsewhere.

Several types of analysis have been applied to the image of a toy bear (figure
21). The half-tone image (figure 4f) has been printed in 21a, and the intensity map is given in
21b. The primal sketch of this image is represented by 21c. The blobs extracted from this
image appear in figure 21d, and the routines for describing the spatial disposition of a small
number of places recognize that these form a (VEE FLAT) configuration (cf. figure 11b),
described relative to the default vertical axis. The contours that form the bear's face appear
in 21e, and 21f shows his muzzle. The extraction of the muzzle made use of the closed form
property, as well as discrepancies in contrast and fuzziness, while choosing between rival
segments near coordinate (80, 65).

Discussion
Perhaps the most novel aspect of these ideas is the notion that the primal
sketch exists as a distinct and circumscribed symbolic entity, computed autonomously from the
image, and operated on repeatedly by a number of local geometrical processes, semi-local
measures, and first-order discriminations. The underlying reason why one needs to compute
such a thing is that in some sense a description like the primal sketch is much closer to what
is really there (i.e. changes in reflectance) than the values of edge-shaped or bar-shaped
mask convolutions, which form a large and confusing set of primary measurements. It would
be almost impossible to deal with so huge a mass of data unless it were first organized into a
readable format.

TABLE 2. First-order measures taken over the primal sketch can control the execution of
grouping techniques. This table shows the orientation statistics of the primal sketch shown in
figure 18. For the purpose of illustration, the orientations have been divided into disjoint
buckets 15 degrees wide, and the total amount of contour and number of primal sketch
elements are shown for each of these buckets. Any criterion would judge 60 degrees to be an
important orientation. The processor therefore tries to group contours having this
orientation.

ORIENTATION (degrees)      0   15   30   45    60   75   90  105  120  135  150  165
NUMBER OF ITEMS          S<*    7   14   16   161   27   42   15   25   25   34   16
TOTAL CONTOUR LENGTH     E32   64  132  116  2213  116  680  11&  13*  3«4  331  13&

The store into which the primal sketch is written is the direct analogue for the
class of images studied here of the Cyclopean retina that Julesz (1971) wrote of for binocular
vision. More subjectively, what it holds corresponds very closely to the image that one is
conscious of. This reflects the computational hypothesis that all subsequent analysis reads
the primal sketch, not the data from which it was computed. The primal sketch therefore acts
in a genuine sense as the interface at which visual analysis becomes a purely symbolic affair.

Implications for neurophysiology

The images studied here are impoverished by their inherent lack of movement
or binocular disparity. Extreme caution is needed when attempting to make predictions from
such a theory, because of the power of these two types of information. For example, a linear
cell with a center-surround receptive field is a hopeless blob-detector on its own. The frog's
fly-catching system only works because the additional constraint of relative motion is added
(Barlow 1953, Lettvin et al. 1959). Movement information together with some extra circuitry
might even turn a linear simple cell with a bar-shaped receptive field into a passable detector
of bars in an image. But a simplistic scheme of this sort, though possibly acceptable to a cat,
would be of little use for deciphering a motionless scene. It is therefore reasonable to expect
that something like a primal sketch is computed, at least by the higher primates. If it is, the
cells that represent the primal sketch should exhibit the consequences of algorithms like
peak-matching, the selection criterion, and the (otherwise surprising) inter-orientation
interactions that are central to its construction. One would also expect grouping processes
that use disparity or motion information to take as their input the primal sketch and at least
some of the classes of tokens obtained from it (Marr 1974).

At a higher level, one would expect to find experimental evidence of the
aggregation processes that the theory predicts should act upon the primal sketch to
decompose it into unit forms. Some of these processes have natural neural representations,
and some do not. For curvilinear and theta-aggregation, one would expect to find a cell that
marks the overall direction of aggregation independently of the orientation of the local
element. One would also expect to find cells that represent place-tokens (recognizable by
their insensitivity to what is at the place), and cells for carrying aspects of the local first-
order and spatial-density measures that are important for texture-based definition of regions.
The design of the most likely neural representation of these processes is not straightforward.

The influence of higher-level knowledge and of purpose on
visual information processing
There are two broader implications of the theory that are worth mentioning.
Firstly, the four principles stated at the beginning of the article have survived intact, and
their guidance has been valuable. The principle of least commitment has played an especially
important role, by its pressure on us to design a system that does not usually do anything
wrong. It caused us to abandon ideas about "trigger features" in favour of the computation of
a "true" description, which led in turn to the gradual elucidation of the processes that are
necessary to read it. The result is bulky rather than complex, and requires prodigious
computing power but little computing sophistication (it could be implemented without difficulty
in a stackless machine). There can however be no doubt that in terms of sheer processing
power, the human visual system must be spectacularly well-endowed.

FIGURE 19. The analysis of this herringbone pattern (figure &.?.) demonstrates that the
methods for distinguishing the texture regions do not depend on their having different
average reflectances. (a) shows the printed image, and (b) the spatial component of the
primal sketch. Typical extracted stripes are shown in (c) and (d).

FIGURE 20. The first two stages of curvilinear aggregation have been run on the primal sketch
of the rod shown in figure 7, and they produced the elements (a), (b) and (c). Once larger
units have been obtained, the governing parameters can be relaxed, and the elliptical form (d)
is obtained by the third step. Up to this point, the system has neither computed nor used any
descriptor of the form's overall shape.

FIGURE 21. The image of a toy bear (figure 4f) has been printed in (a), and its intensity map
appears in (b). The spatial component of the primal sketch is illustrated in (c). The three
principal forms extracted from (c) appear in (d), (e) and (f). The items in (d) are classed as
BLOBs, and the configuration that they form is recognized as a VEE (figure 11b) with modifier
FLAT. The axis relative to which this configuration was computed is the vertical (default
value). The outline of the bear (e), and of his muzzle (f), are simple enough to have been
extracted using only the techniques described in this article. The closed form property was
used to help decide between competing segments at coordinate (80, 65). (The vertical
appears as the negative x axis, because this image was taken with the camera on its side).

The second implication of interest concerns the structure of subsequent
recognition processes. If non-attentive vision can be implemented successfully by
approximately the set of methods defined in this article, it means that visual "forms" can
usually be extracted from the image by using knowledge-free techniques. In other words, the
extraction of a visual form can usually precede its description. From this it follows that it is
usually easy to compute a coarse description of a form before having any idea about what the
form is.

If this is true, it greatly simplifies the design of subsequent recognition
processes, because it means that they too can be made modular. For example, the ability to
compute a coarse description of a form allows one to describe the shape of a forest without
first computing detailed descriptions of all the trees; or to compute the shape of the cluster
of blobs that forms a distant village independently of deciding that some of those blobs are
actually buildings and that the cluster is therefore a village. In the more mundane example of
figure 21, one can compute that the overall shape of the top form is roughly ovoidal without
first having to segment out and describe separately the bumps that are the bear's ears. The
autonomy of early visual processing permits the role of higher level knowledge to be very
restricted, and different in kind from its intervention in programs like Shirai's (1973).
Downward-flowing information will not affect the line-finding stage (the computation of the
primal sketch) at all. Its most usual modus operandi is in choosing which processes are to be
used to read the primal sketch - for example by specifying which texture predicate should be
used on the image to select the parts of current interest. It can also apply certain limited
kinds of force to critical segments during their aggregation into forms (as in the image PLANT).
The coupling between higher-level knowledge and the form-extraction processes is however
much weaker than the coupling between the different form-extraction processes.

It is clearly desirable to have some control over which of the possible forms in
a figure should be delivered at a given moment from the primal sketch. For example, in the
image BEAR there are three possible major forms: the outline of the head, the muzzle, and
the three blobs that represent his eyes and nose. It seems probable that only one of these
should be made available at a time, and this in turn raises interesting questions about the
order in which it is done, the way in which the three forms and their relative positions are
described, and the way in which these descriptions trigger a larger datastructure and are
absorbed by it. In living systems, which are powerful enough to operate in real time, the
control of the direction of gaze may be rather closely related to the order in which these
events take place.



Acknowledgements: This study would not have been possible without the advanced and
flexible computing facilities that are available at the Artificial Intelligence Laboratory. I thank
Profs. M. Minsky and S. Papert for inviting me to the laboratory; Hatsan Alan r Gary Dudley,
and especially Ken Forbus for programming assistance; Dr. B. Julesz and Pion Publications for
permission to reproduce figure 15; and Karen Prendergast for preparing the drawings. The
material described in this paper covers M.I.T. A.I. Lab. Memos 324 & 334. This work was
conducted at the Artificial Intelligence Laboratory, a Massachusetts Institute of Technology
research program supported in part by the Advanced Research Projects Agency of the
Department of Defense, and monitored by the Office of Naval Research under Contract number
N00014-75-C-0643.